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Abstract 

Learning of low dimensional structure in multidimensional data is a canoni¬ 
cal problem in machine learning. One common approach is to suppose that the 
observed data are close to a lower-dimensional smooth manifold. There are a 
rich variety of manifold learning methods available, which allow mapping of data 
points to the manifold. However, there is a clear lack of probabilistic methods 
that allow learning of the manifold along with the generative distribution of the 
observed data. The best attempt is the Gaussian process latent variable model 
(GP-LVM), but identifiability issues lead to poor performance. We solve these 
issues by proposing a novel Coulomb repulsive process (Corp) for locations of 
points on the manifold, inspired by physical models of electrostatic interactions 
among particles. Combining this process with a GP prior for the mapping function 
yields a novel electrostatic GP (electroGP) process. Focusing on the simple case 
of a one-dimensional manifold, we develop efficient inference algorithms, and il¬ 
lustrate substantially improved performance in a variety of experiments including 
filling in missing frames in video. 


1 Introduction 

There is broad interest in learning and exploiting lower-dimensional structure in high¬ 
dimensional data. A canonical case is when the low dimensional structure corresponds 
to a p-dimensional smooth Riemannian manifold A4 embedded in the d-dimensional 
ambient space y of the observed data p. Assuming that the observed data are close 
to A4, it becomes of substantial interest to learn A4 along with the mapping /i from 
A4 ^ y. This allows better data visualization and for one to exploit the lower¬ 
dimensional structure to combat the curse of dimensionality in developing efficient 
machine learning algorithms for a variety of tasks. 

The current literature on manifold learning focuses on estimating the coordinates 
X E M corresponding to y by optimization, finding x's on the manifold M. that pre¬ 
serve distances between the corresponding y's in y. There are many such methods, 
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including Isomap HI, locally-linear embedding O and Laplacian eigenmaps O. Such 
methods have seen broad use, but have some clear limitations relative to probabilistic 
manifold learning approaches, which allow explicit learning of Af, the mapping p and 
the distribution of y. 

There has been some considerable focus on probabilistic models, which would 
seem to allow learning of M and p. Two notable examples are mixtures of factor 
analyzers (MFA) Biia and Gaussian process latent variable models (GP-LVM) m 
Such approaches are useful in exploiting lower-dimensional structure in estimating the 
distribution of y, but unfortunately have critical problems in terms of reliable estima¬ 
tion of the manifold and mapping function. MFA is not smooth in approximating the 
manifold with a collage of lower dimensional hyper-planes, and hence we focus fur¬ 
ther discussion on GP-LVM. Similar problems occur for MFA and other probabilistic 
manifold learning methods. 

In general form for the ith data vector, GP-LVM lets yi = p{Xi) -\- Si, with p as¬ 
signed a Gaussian process prior, Xi generated from a pre-specified Gaussian or uniform 
distribution over ap-dimensional space, and the residual Ci drawn from a d-dimensional 
Gaussian centered on zero with diagonal or spherical covariance. While this model 
seems appropriate to manifold learning, identifiability problems lead to extremely poor 
performance in estimating M. and p. To give an intuition for the root cause of the 
problem, consider the case in which Xi are drawn independently from a uniform dis¬ 
tribution over [0,1]^. The model is so flexible that we could fit the training data yi, 
for i = 1,..., n, just as well if we did not use the entire hypercube but just placed 
all the Xi values in a small subset of [0,1]^. The uniform prior will not discourage 
this tendency to not spread out the latent coordinates, which unfortunately has disas- 
terous consequences illustrated in our experiments. The structure of the model is just 
too flexible, and further constraints are needed. Replacing the uniform with a standard 
Gaussian does not solve the problem. 

To make the problem more tractable, we focus on the case in which Af is a one¬ 
dimensional smooth compact manifold. Assume yi = + e^, with Gaussian 

noise, and : (0,1) ^ Af a smooth mapping such that pjf) e for j = 1,..., d, 
where /i(x) = (/ii(x),..., p.d{x)). We focus on finding a good estimate of p, and 
hence the manifold, via a probabilistic learning framework. We refer to this problem 
as probabilistic curve learning (PCL) motivated by the principal curve literature (71 . 
PCL differs substantially from the principal curve learning problem, which seeks to 
estimate a non-linear curve through the data, which may be very different from the true 
manifold. 

Our proposed approach builds on GP-LVM; in particular, our primary innovation 
is to generate the latent coordinates Xi from a novel repulsive process. There is an 
interesting literature on repulsive point process modeling ranging from various Matern 
processes O to the determinantal point process (DPP) 13. In our very different con¬ 
text, these processes lead to unnecessary complexity — computationally and otherwise 
— and we propose a new Coulomb repulsive process (Corp) motivated by Coulomb’s 
law of electrostatic interaction between electrically charged particles. Using Corp for 
the latent positions has the effect of strongly favoring spread out locations on the man¬ 
ifold, effectively solving the identifiability problem mentioned above for the GP-LVM. 
We refer to the GP with Corp on the latent positions as an electrostatic GP (electroGP). 
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The remainder of the paper is organized as follows. The Coulomb repulsive process 
is proposed in § |^ and the electroGP is presented in § [^ with a comparison between 
electroGP and GP-LVM demonstrated via simulations. The performance is further 
evaluated via real world datasets in § |^ A discussion is reported in § [^ 


2 Coulomb repulsive process 


2.1 Formulation 

Definition 1. A univariate process is a Coulomb repulsive process (Corp) if and only 
if for every finite set of indices G,..., t/. m the index set N+, 


Xti - unif{0, 1), 

( 1 ) 

where r > 0 A the repulsive parameter. The process is denoted as Xf ~ Corp(r). 

The process is named by its analogy in electrostatic physics where by Coulomb 
law, two electrostatic positive charges will repel each other by a force proportional to 
the reciprocal of their square distance. Letting d{x^y) = sin |7rx — 7r^|, the above 
conditional probability of Xt. given Xt^ is proportional to d?^{Xt-^Xt.), shrinking 
the probability exponentially fast as two states get closer to each other. Note that the 
periodicity of the sine function eliminates the edges of [0,1]. making the electrostatic 
energy field homogeneous everywhere on [0, !]• 

Several observations related to Kolmogorov extension theorem can be made imme¬ 
diately, ensuring Corp to be well defined. Firstly, the conditional density defined in O 
is positive and integrable, since X^’s are constrained in a compact interval, and sin^’^(-) 
is positive and bounded. Hence, the finite distributions are well defined. 

Secondly, the joint finite p.d.f. for Xt ^,..., Xt^ can be derived as 

p{Xt, Xt sin"’- (TrXt, - nXt^). (2) 


As can be easily seen, any permutation of ti,..., t/c will result in the same joint finite 
distribution, hence this finite distribution is exchangeable. 

Thirdly, it can be easily checked that for any finite set of indices G, • • •, G+m, 


‘ ‘ ‘ ^tk) f ••• f ‘ ‘ ^tkl ^tk+l^ ‘ ‘ ‘ ^ ^tk + m)^^tk+l ' ‘ ‘ ^^tk + n 

JO JO 


by observing that 




•) ^tk') ^tk+l ^tk + m) pistil 






2.2 Properties 

Assuming X^, t g N+ is a realization from Corp, then the following lemmas hold. 
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Figure 1: Each facet consists of 5 rows, with each row representing an 1-dimensional 
scatterplot of a random realization of Corp under certain n and r. 


Lemma 1. For any n e N+, any 1 <n and any e > 0, have 

27 r 2 (^ 2 r + l 

where B{Xi, e) = {X e (0,1) : d{X - Xi) < e). 

Lemma 2. For any n 6 N+, the p.d.f. 0 of Xi^^ Xn (due to the exchangeability, 
we can assume Xi < X 2 < • • • < X^ without loss of generality) is maximized when 
and only when 


diXi — Xi_i) = sin (—-—) for all 2 ^ i ^ n. 

According to Lemma and Lemma Corp will nudge the x’s to be spread out 
within [0,1], and penalizes the case when two x’s get too close. Figure [^presents some 
simulations from Corp. This nudge becomes stronger as the sample size n grows, or as 
the repulsive parameter r grows. The properties of Corp makes it ideal for strongly fa¬ 
voring spread out latent positions across the manifold, avoiding the gaps and clustering 
in small regions that plague GP-LVM-type methods. The proofs for the lemmas and a 
simulation algorithm based on rejection sampling can be found in the supplement. 


2.3 Multivariate Corp 

Definition 2. A p-dimensional multivariate process is a Coulomb repulsive process if 
and only if for every finite set of indices ti,... ,1^ in the index set N+, 


^m,ti ~ unif{0, l),form = l,...,p 

■ P + 1 


2 


m=l 
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where the p-dimensional spherical coordinates Xt’s have been converted into the {p + 
1)-dimensional Cartesian coordinates Yt: 

Yi^t = cos(27rXi,t) 

Y2,t = sin(27rXi,t) cos(27rX2,t) 


Yp^t = sin(27rXi,t) sin(27rX2,t)... sin(27rXp_i,t) cos(27rXp,t) 
yp+i,t = sin(27rXi,t) sin(27rX2,t)... sin(27rXp_i,t) sin(27rXp,t)- 

The multivariate Corp maps the hyper-cubic (0,1)^ through a spherical coordinate 
system to a unit hyper-ball in The repulsion is then defined as the reciprocal 

of the square Euclidean distances between these mapped points in Based on 

this construction of multivariate Corp, a straightfoward generalization of the electroGP 
model to a p-dimensional manifold could be made, where p > 1. 


3 Electrostatic Gaussian Process 

3.1 Formulation and Model Fitting 

In this section, we propose the electrostatic Gaussian process (electroGP) model. As¬ 
suming n d-dimensional data vectors ... ,Pn are observed, the model is given by 

ViJ = fljixi) + €ij, €ij - ^( 0 , ( t |), 

Xi ~ Corp(r), i = 1,..., n, (3) 

Pj -- gv{o,K^), j = 

where yi = ..., fori = 1,..., n and ^7^(0, K^) denotes a Gaussian pro¬ 

cess prior with covariance function {x^ y) = (j)j exp { — aj{x — 

Letting 0 = {af , o^i, ,..., crj, , (pd) denote the model hyperparameters, model 
^ could be fitted by maximizing the joint posterior distribution of a: = (xi,... .Xn) 
and 0, 

(i, 0) = argmaxp(a:|?/i:n,0,r), (4) 

x.Q 

where the repulsive parameter r is fixed and can be tuned using cross validation. Based 
on our experience, setting r = 1 always yields good results, and hence is used as a 
default across this paper. For the simplicity of notations, r is excluded in the remainder. 
The above optimization problem can be rewritten as 

(x, 0) = argmaxZ(j/i:„|x, 0) + log [7r(x)], 

x.Q 

where /(•) denotes the log likelihood function and 7r(-) denotes the finite dimensional 
pdf of Corp. Hence the Corp prior can also be viewed as a repulsive constraint in the 
optimization problem. 
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It can be easily checked that log [7r{xi = Xj)^ = —go, for any i and j. Starting at 
initial values xq, the optimizer will converge to a local solution that maintains the same 
order as the initial xq’s. We refer to this as the self-truncation property. We find that 
conditionally on the starting order, the optimization algorithm converges rapidly and 
yields stable results. Although the x’s are not identifiable, since the target function 0 
is invariant under rotation, a unique solution does exist conditionally on the specified 
order. 

Self-truncation raises the necessity of finding good initial values, or at least a good 
initial ordering for x’s. Fortunately, in our experience, simply applying any standard 
manifold learning algorithm to estimate xq in a manner that preserves distances in y 
yields good performance. We find very similar results using LLE, Isomap and eigen- 
map, but focus on LLE in all our implementations. Our algorithm can be summarized 
as follows. 

1. Learn the one dimensional coordinate Xq by your favorite distance-preserving 
manifold learning algorithm and rescale into ( 0 , 1 ); 

2. Solve ©0 = argmax 0 p(|/i:n|a:o, ©, r) using scaled conjugate gradient descent 
(SCO); 

3. Using SCG, setting Xq and ©o to be the initial values, solve x and 0 w.r.t. (|^. 

3.2 Posterior Mean Curve and Uncertainty Bands 

In this subsection, we describe how to obtain a point estimate of the curve p and how to 
characterize its uncertainty under electroGP. Such point and interval estimation is as of 
yet unsolved in the literature, and is of critical importance. In particular, it is difficult 
to interpret a single point estimate without some quantification of how uncertain that 
estimate is. We use the posterior mean curve p = E{p\x, yi:n^ ©) as the Bayes optimal 
estimator under squared error loss. As a curve, p has infinite dimensions. Hence, in 
order to store and visualize it, we discretize [ 0 , 1 ] to obtain equally-spaced grid 
points x^ = for i = 1,..., Using basic multivariate Gaussian theory, the 

following expectation is easy to compute. 

e). 

Then p is approximated by linear interpolation using For ease of 

notation, we use p to denote this interpolated piecewise linear curve later on. Examples 
can be found in Eigurej^ where all the mean curves (black solid) were obtained using 
the above method. 

Estimating an uncertainty region including data points with p probability is much 
more challenging. We addressed this problem by the following heuristic algorithm. 

Step 1. Draw xf's from Unif(0,l) independently for i = 1,..., ni; 

Step 2. Sample the corresponding yf from the posterior predictive distribution condi¬ 
tional on these latent coordinates p(?/*,..., ©); 
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Figure 2: Visualization of three simulation experiments where the data (triangles) are 
simulated from a bivariate Gaussian (left), a rotated parabola with Gaussian noises 
(middle) and a spiral with Gaussian noises (right). The dotted shading denotes the 
95% posterior predictive uncertainty band of {yi , ^ 2 ) under electroGR The black curve 
denotes the posterior mean curve under electroGP and the red curve denotes the P- 
curve. The three dashed curves denote three realizations from GP-LVM. The middle 
panel shows a zoom-in region and the full figure is shown in the embedded box. 


Step 3. Repeat steps 1-2 77,2 times, collecting all ni x 77,2 samples |/*’s; 

Step 4. Find the shortest distances from these |/*’s to the posterior mean curve fi, and 
find the r^-quantile of these distances denoted by p\ 

Step 5. Moving a radius-p ball through the entire curve /i([0,1]), the envelope of the 
moving trace defines the r]% uncertainty band. 

Note that step 4 can be easily solved since /i is a piecewise linear curve. Examples can 
be found in Figure where the 95% uncertainty bands (dotted shading) were found 
using the above algorithm. 


3.3 Simulation 

In this subsection, we compare the performance of electroGP with GP-LVM and prin¬ 
cipal curves (P-curve) in simple simulation experiments. 100 data points were sampled 
from each of the following three 2-dimensional distributions: a Gaussian distribution, 
a rotated parabola with Gaussian noises and a spiral with Gaussian noises. ElectroGP 
and GP-LVM were fitted using the same initial values obtained from LLE, and the 
P-Curve was fitted using the princurve package in R. 

The performance of the three methods is compared in Figure]^ The dotted shading 
represents a 95% posterior predictive uncertainty band for a new data point yn+i under 
the electroGP model. This illustrates that electroGP obtains an excellent fit to the data, 
provides a good characterization of uncertainty, and accurately captures the concentra¬ 
tion near a Id manifold embedded in two dimensions. The P-curve is plotted in red. 
The extremely poor representation of P-curve is as expected based on our experience 
in fitting principal curve in a wide variety of cases; the behavior is highly unstable. In 
the first two cases, the P-Curve corresponds to a smooth curve through the center of the 
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Figure 3: The zoom-in of the spiral case 3 (left) and the corresponding coordinate 
function, of electroGP (middle) and GP-LVM (right). The gray shading de¬ 

notes the heatmap of the posterior distribution of (x, ^ 2 ) and the black curve denotes 
the posterior mean. 


data, but for the more complex manifold in the third case, the P-Curve is an extremely 
poor representation. This tendency to cut across large regions of near zero data density 
for highly curved manifolds is common for P-Curve. 

For GP-LVM, we show three random realizations (dashed) from the posterior in 
each case. It is clear the results are completely unreliable, with the tendency being to 
place part of the curve through where the data have high density, while also erratically 
adding extra outside the range of the data. The GP-LVM model does not appropri¬ 
ately penalize such extra parts, and the very poor performance shown in the top right 
of Figure is not unusual. We find that electroGP in general performs dramatically 
better than competitors. More simulation results can be found in the supplement. To 
better illustrate the results for the spiral case 3, we zoom in and present some further 
comparisons of GP-LVM and electroGP in Figure 

As can be seen the right panel, optimizing x’s without any constraint results in 
“holes” on [0,1]. The trajectories of the Gaussian process over these holes will be¬ 
come arbitrary, as illustrated by the three realizations. This arbitrariness will be further 
projected into the input space y, resulting in the erratic curve observed in the left panel. 
Failing to have well spread out x’s over [0,1] not only causes trouble in learning the 
curve, but also makes the posterior predictive distribution of yn+i overly diffuse near 
these holes, e.g., the large gray shading area in the right panel. The middle panel 
shows that electroGP fills in these holes by softly constraining the latent coordinates 
x’s to spread out while still allowing the flexibility of moving them around to find a 
smooth curve snaking through them. 

3.4 Prediction 

Broad prediction problems can be formulated as the following missing data problem. 
Assume m new data Zi, for i = 1,... ,m, are partially observed and the missing 
entries are to be filled in. Letting zf denote the observed data vector and zf^ denote 
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the missing part, the conditional distribution of the missing data is given by 

X x, e)da;f • ■ ■ da;^, 

where xf is the corresponding latent coordinate of 2 ^, for i = 1,... ,n. However, 
dealing with (xf,..., x^) jointly is intractable due to the high non-linearity of the 
Gaussian process, which motivates the following approximation, 

P(a^l:ml-Z°m>*>2/l:n>0) ^ ^yi-n, &) ■ 

The approximation assumes (xf,..., x^) to be conditionally independent. This as¬ 
sumption is more accurate if x is well spread out on (0,1), as is favored by Corp. 

The univariate distribution p{xf\xf,yi:n,u^ 0), though still intractable, is much 
easier to deal with. Depending on the purpose of the application, either a Metropo¬ 
lis Hasting algorithm could be adopted to sample from the predictive distribution, or 
a optimization method could be used to find the MAP of x^'s. The details of both 
algorithms can be found in the supplement. 

4 Experiments 

Video-inpainting 200 consecutive frames (of size 76 x 101 with RGB color) ifTOl 
were collected from a video of a teapot rotating 180°. Clearly these images roughly lie 
on a curve. 190 of the frames were assumed to be fully observed in the natural time 
order of the video, while the other 10 frames were given without any ordering infor¬ 
mation. Moreover, half of the pixels of these 10 frames were missing. The electroGP 
was fitted based on the other 190 images and was used to reconstruct the broken frames 
and impute the reconstructed frames into the whole frame series with the correct order. 
The reconstruction results are presented in Figure]^ As can be seen, the reconstructed 
images are almost indistinguishable from the original ones. Note that these 10 frames 
were also correctly imputed into the video with respect to their latent position x's. 
ElectroGP was compared with Bayesian GP-LVM llTT]| with the latent dimension set to 
1. The reconstruction mean square error (MSB) using electroGP is 70.62, compared to 
450.75 using GP-LVM. The comparison is also presented in Figure]^ It can be seen 
that electroGP outperforms Bayesian GP-LVM in high-resolution precision (e.g., how 
well they reconstructed the handle of the teapot) since it obtains a much tighter and 
more precise estimate of the manifold. 

Super-resolution & Denoising 100 consecutive frames (of size 100 x 100 with 
gray color) were collected from a video of a shrinking shockwave. Frame 51 to 55 were 
assumed completely missing and the other 95 frames were observed with the original 
time order with strong white noises. The shockwave is homogeneous in all directions 
from the center; hence, the frames roughly lie on a curve. The electroGP was applied 
for two tasks: 1. Frame denoising; 2. Improving resolution by interpolating frames 
in between the existing frames. Note that the second task is hard since there are 5 
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Figure 4: Left Panel: Three randomly selected reconstructions using electroGP com¬ 
pared with those using Bayesian GP-LVM; Right Panel: Another three reconstructions 
from electroGP, with the first row presenting the original images, the second row pre¬ 
senting the observed images and the third row presenting the reconstructions. 


consecutive frames missing and they can be interpolated only if the electroGP correctly 
learns the underlying manifold. 

The denoising performance was compared with non-local mean filter (NLM) 
and isotropic diffusion (IsD) ca. The interpolation performance was compared with 
linear interpolation (LI). The comparison is presented in Figure As can be clearly 
seen, electroGP greatly outperforms other methods since it correctly learned this one¬ 
dimensional manifold. To be specific, the denoising MSE using electroGP is only 
1.8 X 10“^, comparing to 63.37 using NLM and 61.79 using IsD. The MSE of recon¬ 
structing the entirely missing frame 53 using electroGP is 2 x 10“^ compared to 13 
using LI. An online video of the super-resolution result using electroGP can be found 
in this lin^ The frame per second (fps) of the generated video under electroGP was 
tripled compared to the original one. Though over two thirds of the frames are pure 
generations from electroGP, this new video fiows quite smoothly. Another noticeable 
thing is that the 5 missing frames were perfectly regenerated by electroGP. 

5 Discussion 

Manifold learning has dramatic importance in many applications where high-dimensional 
data are collected with unknown low dimensional manifold structure. While most of 
the methods focus on finding lower dimensional summaries or characterizing the joint 
distribution of the data, there is (to our knowledge) no reliable method for probabilistic 
learning of the manifold. This turns out to be a daunting problem due to major is¬ 
sues with identifiability leading to unstable and generally poor performance for current 
probabilistic non-linear dimensionality reduction methods. It is not obvious how to 
incorporate appropriate geometric constraints to ensure identifiability of the manifold 
without also enforcing overly-restrictive assumptions about its form. 

We tackled this problem in the one-dimensional manifold (curve) case and built 
a novel electrostatic Gaussian process model based on the general framework of GP- 
LVM by introducing a novel Coulomb repulsive process. Both simulations and real 

^ https://youtu.be/NlBG220J5Js This online video contains no information regarding the authors. 
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Figure 5: Row 1: From left to right are the original 95th frame, its noisy observation, 
its denoised result by electroGP, NLM and IsD; Row 2: From left to right are the 
original 53th frame, its regeneration by electroGP, the residual image (10 times of the 
absolute error between the imputation and the original) of electroGP and LI. The blank 
area denotes its missing observation. 


world data experiments showed excellent performance of the proposed model in accu¬ 
rately estimating the manifold while characterizing uncertainty. Indeed, performance 
gains relative to competitors were dramatic. The proposed electroGP is shown to be 
applicable to many learning problems including video-inpainting, super-resolution and 
video-denoising. There are many interesting areas for future study including the devel¬ 
opment of efficient algorithms for applying the model for multidimensional manifolds, 
while learning the dimension. 
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Appendix 


Proof of Lemma 1 and Lemma 2 

Lemma 1 

Proof. By definition of the Corp, for e small enough, we have 

p{XneB{Xi,e)\Xi,...,Xn-i) 


=C 


J, 

asiE 

f 


^ ^ npi sin^" (7rX„ - 7rX,)dX„ 

[X„-Xi)<e 

npi sin^'- (7rX„ - 7rX,)dX„, 


J\X,,-Xi\<e 

where C is the normalizing constant. When Xi e (e, 1 — e), the following is true, 

rXi + e 


mzl sin^^ {xXn - xXj)dXr^ 

JXi-e 


< 


=2 


r ‘ sin^’-(7rX„ - 7rXi)dX„ 

JXi-e 

r sin^’^(7rx)dx 

Jo 

’/■ 


<2 I x‘^^dx 

Jo 

2^2e2r+l 


2 r + l 

When Xi g (0, e), the following is true. 


sin^^ (jrXn - 7rXj)dXr, 


< 


rXi + e 

Jo Jl-e+Xi 

rXi + e pi 

J sin^'- (xX„ - 7rXi)dX„ + J 


sm TT 


— 7rXn + 7rX^)dX„ 


= ^ J + J ^ sin^’^(7rx)dx + J sin^’^(7rx)dx 


< 


271 e 


2^2r+l 


2 r + 1 

The proof for g (1 — e, 1) is the same as above and hence is neglected here. □ 
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Lemma 2 


Proof. The log of the density is given up to a constant by 


;(Xi,...,x„)ay;ciog 


i>j 


sin^ [iiXi — TrXj) 


The first order derivatives are given by 


dl ^ 2c7rsin (ttX^ — ttXj^ cos (ttX^ — ttXj^ 

* J^^ 


{nX, - nXj) 


= ^ 2c7r cot {jrXi — TiXj) 




(5) 


For any Xi < X 2 < ■' < X^ satisfying condition d{Xi — Xi_i) = sin (^^), ^ 
can be rewritten as 


^ 2c7rcot {jiXi — 'kXj) 


= ^ 2c7r cot 


i—l 




i=i 

i—l 


= ^ 2c7rcot ( —TTI + 2 2c7rcot ( — 


j=i+l 


3 


j=i 

n—1 


= ^ 2c7r cot ( —TT j + 2 2c7r cot 


j=i+i 


n — j + i 


-TT 


n 


= y; 2c7rcot K 


J=1 

=0 


Hence Q satisfies the first order condition. The second order derivatives are given by 


dH 


dXidXj 


= 2c7r- 


cot {11 Xi — TrXj) 


dX. 


for i ^ j and 


dH 

dx} 


= £ 2c7r- 


cot (jiXi — irXj ) 




ax, 




d^l 


dXidX. 
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Hence the Hessian matrix is positive semi-definite, indicating that ([^ is a global max¬ 
ima. Note also that the Hessian matrix is rank-deficit, indicating that the solution to 
this maximization problem is not unique. □ 


Sampling from Corp 

The sampling method can be easily summarized as. 

Step 1 Sample Xi from Unif(0,l); 

Step 2 Repeatedly sample Xi fTomp{Xi\Xi ,..., Xi-i) until desired sample size reached. 
The difficulty arises in step 2 since 

sin^’- (tt^^ - 

is multi-modal and not analytically integrable. Fortunately, sampling from the above 
univariate distribution can be done by rejection sampling. The only trick here is to 
find a proper proposal distribution. Naively using a uniform would result in very high 
rejection rate as i grows larger. 

Assuming without loss of generosity that Xi < X 2 < ... < it can be 

easily checked that there is one local mode within each interval of A^+i), for 
We denote the mode by pj and the interval by Sj . There is also one 
mode on (0, Xi) |J(X^_i, 1). We denote this mode by Pi-i, and this interval by 
Sampling from this conditional distribution can be summarized as. 

Step 1 Sample k from Multinomial^(a) where aj = p{Xi\Xi ,... ,Xi_i)dX^ for 
j = — 1. These integration is done using numerical method; 

Step 2 Use Unif(5'/c) as the proposal distribution and calculate pk using numerical max¬ 
imization method. Use rejection sampling to sample Xi from the truncated con¬ 
ditional distribution ps^ {Xi\Xi ^..., X^_i). 


Prediction 

Assume m new data Zi, for i = 1,... ,m, are partially observed and the missing 
entries are to be predicted. Letting zf denote the observed data vector and zf- denote 
the missing part. We approximate the predictive distribution by assuming that these 
Zi's are conditionally independent. For ease of notation, we focus on discussing the 
prediction algorithm for one partially observed new data vector ( 2 :^, z^). 

Sample from the posterior predictive distribution Instead of sampling from 
p{z^\z^,x,yi:n, 0), we sample fromp{z^,x^\z^,x,yi:n, 0), which can be factor¬ 
ized into two parts p{z^\x^ ^ z^ ^x^yi^n^ 0) and p{x^\z^ ^x^yi:n^ 0). The first part 
is simply a conditional Gaussian distribution and can be easily sampled. We use 
the Metropolis Hasting algorithm to sample from the intractable second part, using 
Unif(0,l) as the proposal distribution. Note that Unif(0,l) is a natural choice, since 
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it is the prior distribution of x. It can be easily generalized to a piecewise uniform 
distribution, as what we did in sampling Corp, to decrease the rejection rate. 

Find the MAP MCMC can be infeasible in some applications due to its expensive 
computation. A straightforward solution is to use EM algorithm treating as an 
augmented variable, which will give us a point estimate of . We propose another 
heuristic algorithm that would give us instead of point estimate a distribution of . 
The algorithm is very simple and is summarized as follows. 

Step 1. Find x^ by maximizing p{x^\z^Vi-.m 0)^ 

Step 2. Return p{z^ \x^ ,z^ ,x 5 yi:ni 0). which is simply a multivariate Gaussian. 


Two More Simulation Results 



Figure 6: Left: Rotated sine curve with Gaussian noises; Right: Arc with Gaussian 
noises. 
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